7 research outputs found

    Bump hunting with non-Gaussian kernels

    It is well known that the number of modes of a kernel density estimator is monotone nonincreasing in the bandwidth if the kernel is a Gaussian density. There is numerical evidence of nonmonotonicity in the case of some non-Gaussian kernels, but little additional information is available. The present paper provides theoretical and numerical descriptions of the extent to which the number of modes is a nonmonotone function of bandwidth in the case of general compactly supported densities. Our results address popular kernels used in practice, for example, the Epanechnikov, biweight and triweight kernels, and show that in such cases nonmonotonicity is present with strictly positive probability for all sample sizes n ≥ 3. In the Epanechnikov and biweight cases the probability of nonmonotonicity equals 1 for all n ≥ 2. Nevertheless, in spite of the prevalence of lack of monotonicity revealed by these results, it is shown that the notion of a critical bandwidth (the smallest bandwidth above which the number of modes is guaranteed to be monotone) is still well defined. Moreover, just as in the Gaussian case, the critical bandwidth is of the same size as the bandwidth that minimises mean squared error of the density estimator. These theoretical results, and new numerical evidence, show that the main effects of nonmonotonicity occur for relatively small bandwidths, and have negligible impact on many aspects of bump hunting.
    Comment: Published at http://dx.doi.org/10.1214/009053604000000715 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
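    The dependence of the mode count on bandwidth described in this abstract can be explored numerically. The following is a minimal sketch, not the paper's method: it evaluates a kernel density estimate on a finite grid and counts sign changes of the first difference. The function names (`kde`, `count_modes`, `mode_counts`), the grid size and the padding of five bandwidths are all illustrative assumptions, and a grid-based count can miss very narrow bumps.

```python
import numpy as np

def gaussian(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(grid, data, h, kernel):
    """Kernel density estimate with bandwidth h, evaluated on grid."""
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)

def count_modes(density):
    """Count local maxima as +/- sign changes of the first difference.

    Flat stretches (zero differences) are skipped, so a plateau
    counts as at most one mode.
    """
    signs = np.sign(np.diff(density))
    signs = signs[signs != 0]
    return int(np.sum((signs[:-1] > 0) & (signs[1:] < 0)))

def mode_counts(data, bandwidths, kernel, grid_size=4001):
    """Mode count of the estimate for each bandwidth (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    counts = []
    for h in bandwidths:
        pad = 5.0 * h  # assumed padding so the tails are covered
        grid = np.linspace(data.min() - pad, data.max() + pad, grid_size)
        counts.append(count_modes(kde(grid, data, h, kernel)))
    return counts
```

    Calling `mode_counts(sample, bandwidths, gaussian)` and `mode_counts(sample, bandwidths, epanechnikov)` on the same sample and an increasing sequence of bandwidths gives two sequences that can be compared for monotonicity, subject to the resolution of the grid.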

    A test of mode existence with applications to multimodality

    Modes, or local maxima, are often among the most interesting features of a probability density function. Given a set of data drawn from an unknown density, it is frequently desirable to know whether or not the density is multimodal, and various procedures have been suggested for investigating the question of multimodality in the context of hypothesis testing. Available tests, however, suffer from the encumbrance of testing the entire density at once, frequently through the use of nonparametric density estimates using a single bandwidth parameter. Such a procedure puts the investigator examining a density with several modes of varying sizes at a disadvantage. A new test is proposed involving testing the reality of individual observed modes, rather than directly testing the number of modes of the density as a whole. The test statistic used is a measure of the size of the mode: the absolute integrated difference between the estimated density and the same density with the mode in question excised at the level of the higher of its two surrounding antimodes. Samples are simulated from a conservative member of the composite null hypothesis to estimate p-values within a Monte Carlo setting. Such a test can be combined with the graphical notion of a "mode tree," in which estimated mode locations are plotted over a range of kernel bandwidths. In this way, one can obtain a procedure for examining, in an adaptive fashion, not only the reality of individual modes, but also the overall number of modes of the density. A proof of consistency of the test statistic is offered, simulation results are presented, and applications to real data are illustrated.
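    One possible reading of the mode-size statistic described in this abstract is sketched below. It is an interpretation, not the author's implementation: the density is assumed to be given on a grid, antimodes are found by walking downhill from the mode, an edge of the grid stands in for a missing antimode, and the integral uses the trapezoid rule. The name `mode_size` is hypothetical.

```python
import numpy as np

def mode_size(grid, density, mode_idx):
    """Area of the density above the higher of the two surrounding antimodes."""
    grid = np.asarray(grid, dtype=float)
    density = np.asarray(density, dtype=float)

    # Walk downhill to the left and right of the mode to find the antimodes.
    left = mode_idx
    while left > 0 and density[left - 1] <= density[left]:
        left -= 1
    right = mode_idx
    while right < len(density) - 1 and density[right + 1] <= density[right]:
        right += 1

    # Excise the mode at the level of the higher antimode.
    level = max(density[left], density[right])
    excess = np.clip(density[left:right + 1] - level, 0.0, None)

    # Trapezoid rule for the integrated difference.
    dx = np.diff(grid[left:right + 1])
    return float(np.sum(0.5 * (excess[1:] + excess[:-1]) * dx))
```

    In the test described above, a value of this kind would be compared with values computed from samples simulated under the null hypothesis; that Monte Carlo step is not shown here.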

    High order data sharpening for density estimation

    It is shown that data sharpening can be used to produce density estimators that enjoy arbitrarily high orders of bias reduction. Practical advantages of this approach, relative to competing methods, are demonstrated. They include the sheer simplicity of the estimators, which makes code for computing them particularly easy to write, very good mean-squared error performance, reduced 'wiggliness' of estimates and greater robustness against undersmoothing.

    New Terrain in the Mode Forest

    The mode tree of Minnotte and Scott (1993) provides a valuable method of investigating features such as modes and bumps in an unknown density. By examining kernel density estimates for a range of bandwidths, we can learn a lot about the structure of a data set. Unfortunately, the basic mode tree can be strongly affected by small changes in the data, and gives no way to differentiate between important modes and those caused, for example, by outliers. The mode forest overcomes these difficulties by looking simultaneously at a large collection of mode trees, all based on some variation of the original data, by means such as resampling or jittering. The result is both visually appealing and informative.
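    The idea of stacking many mode trees can be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' procedure: it uses a Gaussian kernel, a fixed evaluation grid, bootstrap resampling as the data variation, and a simple count of how often each grid cell holds a mode. The names `mode_tree` and `mode_forest` and all parameters are hypothetical.

```python
import numpy as np

def gaussian_kde_on_grid(grid, data, h):
    """Gaussian kernel density estimate with bandwidth h on a fixed grid."""
    u = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def peak_mask(density):
    """Boolean mask over interior grid points that are local maxima."""
    inner = density[1:-1]
    return (inner > density[:-2]) & (inner >= density[2:])

def mode_tree(data, bandwidths, grid):
    """Mode locations of the estimate for each bandwidth."""
    return [grid[1:-1][peak_mask(gaussian_kde_on_grid(grid, data, h))]
            for h in bandwidths]

def mode_forest(data, bandwidths, grid, n_trees=50, seed=0):
    """Count, per bandwidth and grid cell, how many resampled trees have a mode there."""
    rng = np.random.default_rng(seed)
    hits = np.zeros((len(bandwidths), len(grid)), dtype=int)
    for _ in range(n_trees):
        # Bootstrap resample; jittering the data would be another option.
        resample = rng.choice(data, size=len(data), replace=True)
        for i, h in enumerate(bandwidths):
            hits[i, 1:-1] += peak_mask(gaussian_kde_on_grid(grid, resample, h))
    return hits
```

    The returned array can be shown as an image with location on one axis and bandwidth on the other, so that modes recurring across resamples appear as darker branches.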

    The Bumpy Road to the Mode Forest

    The mode tree of Minnotte and Scott (1993) provides a valuable method of investigating features such as modes and bumps in an unknown density. By examining kernel density estimates for a range of bandwidths, we can learn a lot about the structure of a data set. Unfortunately, the basic mode tree can be strongly affected by small changes in the data, and gives no way to differentiate between important modes and those caused, for example, by outliers. The mode forest overcomes these difficulties by looking simultaneously at a large collection of mode trees, all based on some variation of the original data, by means such as resampling or jittering. The result is both visually appealing and informative.